Generic Loop Parallelization for Reconfigurable Architectures

Authors

Ozana Silvia Dragomir

Koen Bertels

Abstract

Reconfigurable Computing (RC) is one of the most intensively studied research areas today because of its potential to dramatically increase application performance. RC combines a general purpose processor (GPP) and a Field Programmable Gate Array (FPGA), offering the advantages of both hardware performance and software flexibility. Modern real-life applications (such as audio, video, and image processing) spend most of their execution time in loops, which represent or include the application kernels. These loops are an important source of performance improvement. In our work, we target loops whose bodies contain code for the GPP (software functions) as well as for the FPGA (hardware functions). We assume there are data dependencies between consecutive tasks in the loop body, but not between different loop iterations. Assuming the Molen machine organization as our framework, we apply existing loop optimizations to such loops, with the purpose of parallelizing applications so that multiple kernel instances run in parallel on the reconfigurable hardware while code executes concurrently on the GPP. In this paper, we focus on loop transformations that are suitable for loops containing an arbitrary number of software and hardware functions. Extended shifting consists of relocating the functions placed at the beginning and at the end of one loop iteration, in order to eliminate the data dependencies and allow certain software and hardware functions to be executed in parallel. Loop distribution consists of splitting the loop into smaller loops (e.g., with only one kernel each), which in some cases allows a larger degree of parallelism when the loop unrolling and shifting techniques are applied.
We estimate the performance achieved by applying the extended shifting technique in conjunction with loop unrolling, and compare it to the performance achieved when applying the loop unrolling and shifting techniques to the smaller loops obtained by distributing the original loop. For the experimental results we used randomly generated tests, for loops containing a variable number of kernels (between 2 and 8).
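
The shifting and distribution transformations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not model the Molen platform: `sw` and `hw` are hypothetical stand-ins for one software function and one hardware kernel, and everything runs sequentially in Python. The sketch only shows that both transformed loops compute the same results as the original, and where the independent calls appear after shifting.

```python
def sw(x):
    """Hypothetical software function (would run on the GPP)."""
    return x + 1


def hw(x):
    """Hypothetical hardware kernel (would run on the FPGA)."""
    return x * 2


def original_loop(data):
    # Within one iteration, hw depends on the result of sw;
    # different iterations are independent of each other.
    out = []
    for x in data:
        t = sw(x)
        out.append(hw(t))
    return out


def shifted_loop(data):
    # Loop shifting: sw of iteration i is moved next to hw of iteration i-1.
    # Inside the new loop body the two calls no longer depend on each other,
    # so on a real platform they could execute concurrently.
    out = []
    if not data:
        return out
    t = sw(data[0])            # prologue: sw of the first iteration
    for x in data[1:]:
        t_next = sw(x)         # iteration i (GPP side)
        out.append(hw(t))      # iteration i-1 (FPGA side), independent of t_next
        t = t_next
    out.append(hw(t))          # epilogue: hw of the last iteration
    return out


def distributed_loop(data):
    # Loop distribution: one smaller loop per function, with the
    # intermediate results buffered between the two loops.
    tmp = [sw(x) for x in data]
    return [hw(t) for t in tmp]
```

Calling the three functions on the same input list returns identical results, which is the correctness condition any such transformation has to preserve; the performance differences discussed in the abstract come from how the independent calls are scheduled on the GPP and the FPGA, which this sketch does not measure.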

Similar resources

Loop Parallelization for Reconfigurable Architectures

Reconfigurable Computing (RC) is one of the research directions that focuses on accelerating applications. In the presented approach we assume the Molen machine organization and the Molen programming paradigm as our framework. Molen combines a general purpose processor (GPP) and a Field Programmable Gate Array (FPGA), having the advantages of both speed of hardware and flexibility of software e...

Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling - Computers and Digital Techniques, IEE Proceedings-

Coarse-grained reconfigurable architectures have become increasingly important in recent years. Automatic design or compilation tools are essential to their success. A modulo scheduling algorithm to exploit loop-level parallelism for coarse-grained reconfigurable architectures is presented. This algorithm is a key part of a dynamically reconfigurable embedded systems compiler (DRESC). It is cap...

Mapping Loops on Coarse-Grain Reconfigurable Architectures Using Memory Operation Sharing

Recently many coarse-grain reconfigurable architectures have emerged as programmable coprocessors, considerably relieving the burden of the main processors in many multimedia applications. While their very high degree of parallelism enables high performance in compute-intensive loops, their shared memory interface between several processing elements often becomes a bottleneck in many multimedia...

Enabling Parallelization via a Reconfigurable Chip Multiprocessor

While reconfigurable computing has traditionally involved attaching a reconfigurable fabric to a single processor core, the prospect of large-scale CMPs calls for a reevaluation of reconfigurable computing from the perspective of multicore architectures. We present ReMAPP, a reconfigurable architecture geared towards application acceleration and parallelization. In ReMAPP, parallel threads shar...

Evaluating Memory Architectures for Media Applications on Coarse-Grained Reconfigurable Architectures

Reconfigurable ALU Array (RAA) architectures—representing a popular class of Coarse-grained Reconfigurable Architectures—are gaining in popularity especially for media applications due to their flexibility, regularity, and efficiency. In such architectures, memory is critical not only for configuration data but also for the heavy data traffic required by the application. Hence, system designers...

Publication date: 2008
